ABSTRACT

Recently, collaborative tagging systems have attracted more
and more attention and have been widely applied in web
systems. Tags provide highly abstracted information about
personal preferences and item content, and therefore have
the potential to improve personalized recommendations. In
this paper, we propose a tag-based recommendation algorithm
that considers the personal vocabulary, and evaluate it on a
real-world data set: Del.icio.us. Experimental results
demonstrate that the usage of tag information can
significantly improve the accuracy of personalized
recommendations.

Categories and Subject Descriptors 

H.2.8 [Database Management]: Database Applications - Data
mining; H.3.3 [Information Storage and Retrieval]:
Information Search and Retrieval - Information filtering

General Terms 

Algorithms, Experimentation 

Keywords 

1. INTRODUCTION

The exponential growth of web information has brought us
into an era of information overload: we face too many data
and sources to find those most relevant and interesting to
us, and evaluating all the alternatives by ourselves is not
possible. As a consequence, an urgent problem is how to find
the relevant items automatically. Internet search engines
[3] provide a useful tool for finding such information and
have achieved great success over the last decade. However,
they do not take personalized information into account and
return the same results to people with very different
habits. In comparison, recommender systems [15], which adopt
knowledge discovery techniques to provide personalized
recommendations, are now considered the most promising way
to gather useful information efficiently. Thus far,
recommender systems have found successful applications in
e-commerce [16], such as book recommendations at Amazon.com
[11], movie recommendations at Netflix.com [2], video
recommendations at TiVo.com [1], and so on.

One of the most prominent techniques in recommender systems
is Collaborative Filtering (CF), where a user is recommended
items that people with similar tastes and preferences liked
in the past. Despite its success, the performance of CF is
strongly limited by data sparsity. Thus, a number of studies
have been devoted to integrating additional information,
such as user profiles [10], item content Q3] and attributes
[21], to filter out possibly irrelevant recommendations.
However, these applications are usually strongly restricted
by the need to respect personal privacy, or limited by the
lack of available content information.

Collaborative tagging systems (CTSes), which allow users to
freely assign tags to their collections, offer a promising
way to address the above issues. CTSes require no specific
skills from participating users, and can thus overcome the
limitations of vocabulary domain and size, widen the
semantic relations among items and eventually facilitate the
emergence of folksonomy S]. In addition, tags can be treated
as abstracted content of items. In particular, tags are
given by the users themselves and thus, to some extent,
represent their personal vocabulary and preferences. In this
paper, we propose a tag-based recommendation algorithm that
takes into account the personal vocabulary. We use one
benchmark data set, Del.icio.us, to evaluate our algorithm.
Experimental results demonstrate that the usage of tag
information can significantly improve the accuracy of
recommendations.

The rest of this paper is organized as follows. Section 2 
reviews the related work. In Section 3 we introduce our 
proposed algorithm and report the experimental results. Fi- 
nally, we summarize this paper and outline some open issues 
for future research in Section 4. 



2. RELATED WORK 

Recently, much effort has been devoted to understanding the
structure, evolution [3] and usage patterns [7] of CTSes. A
considerable number of algorithms have been designed to
recommend tags to users, which may be helpful for better
organizing, discovering and retrieving items [9] [12] [20].



Table 1: Basic information of the data set.

Value        Description
---------    ------------------------------
9,991        number of users
243,737      number of items
102,732      number of tags
1,257,908    number of user-item relations
4,391,073    accumulative number of tags



The current work focuses on a relevant yet different
application of CTSes, that is, providing personalized item
recommendations with the help of tag information. Schenkel
et al. [17] proposed an incremental threshold algorithm that
takes into account both the social ties among users and the
semantic relatedness of different tags, and that performs
remarkably better than the algorithm without tag expansion.
Nakamoto et al. [13] created a tag-based contextual
collaborative filtering model, where the tag information is
treated as the users' profiles. Tso-Sutter et al. [22]
proposed a generic method that allows tags to be
incorporated into standard collaborative filtering, by
reducing the ternary correlations to three binary
correlations and then applying a fusion method to
re-associate these correlations. Chi et al. [5] presented a
model based on probabilistic polyadic factorization for
personalized recommendation. Shepitsen et al. Q15] proposed
a tag clustering-based method to improve the algorithmic
accuracy. Zhang et al. [23] presented a diffusion-based
hybrid algorithm for personalized recommendation in CTSes.
Shang et al. [18] proposed a hybrid collaborative filtering
algorithm on user-item-tag tripartite graphs.



3. ALGORITHM AND EXPERIMENTS 

In this paper, we adopt a weighted variant of the
diffusion-based method proposed in [25], where the weights
are given according to the personal vocabulary in CTSes. A
CTS consists of three sets, for users
$U = \{U_1, U_2, \cdots, U_n\}$, items
$I = \{I_1, I_2, \cdots, I_m\}$, and tags
$T = \{T_1, T_2, \cdots, T_s\}$, respectively. Different
users may view the same item differently, and such
differences can be characterized to some extent by looking
into their different usage patterns of tags. Although tags
are freely given, people are supposed to use their favorite
words to describe their best collections. A latent
assumption is that the more frequently a user uses a tag,
the more likely the user likes this tag as well as the items
labeled with it. On the other hand, users are not willing to
give too many tags to a single item.
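The personal vocabulary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(user, item, tags)`, the function name `build_vocabulary` and the toy records are hypothetical and do not come from the paper, and each (user, item) pair is assumed to appear at most once.

```python
from collections import Counter, defaultdict


def build_vocabulary(assignments):
    """Collect per-user tagging information from (user, item, tags) records.

    Returns two mappings:
      tags_of[user][item] -> set of tags the user attached to the item
      usage[user][tag]    -> number of times the user applied the tag
    Assumes each (user, item) pair appears at most once in the input.
    """
    tags_of = defaultdict(dict)
    usage = defaultdict(Counter)
    for user, item, tags in assignments:
        tag_set = set(tags)
        tags_of[user][item] = tag_set
        usage[user].update(tag_set)  # one count per tagged item
    return tags_of, usage


# Hypothetical toy records, for illustration only.
assignments = [
    ("u1", "a", ["python", "web"]),
    ("u1", "b", ["python"]),
    ("u2", "a", ["news"]),
]
tags_of, usage = build_vocabulary(assignments)
```

Here `usage["u1"]` plays the role of the personal vocabulary of user `u1`, with the tag used on more items receiving the larger count.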



3.1 Algorithm 

In this subsection, we introduce a simple way of utilizing
tag information to provide better recommendations. As
mentioned above, we consider two factors: (i) the frequency
of each tag used by each user; (ii) the number of tags
assigned to a single item. Since our aim is to find the most
relevant items for a particular user, the so-called
personalized recommendation, we describe our algorithm for a
target user $U_i$. The algorithm can be expressed in the
following steps:

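Step 1: Define the initial value vector $f$ for all the
items, whose element reads:

$$ f_j = \frac{\sum_{t_s \in T_{ij}} K(t_{is})}{|T_{ij}| \sum_{s'} K(t_{is'})}, \qquad (1) $$

where $T_{ij}$ is the set of tags that $U_i$ has assigned to
item $I_j$, $|T_{ij}|$ denotes the number of such tags,
$K(t_{is})$ is the number of times tag $t_s$ has been used
by $U_i$, and the sum in the denominator runs over all tags
used by $U_i$.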
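A minimal sketch of Step 1 follows, under one possible reading of Eq. (1): the initial value of an item is the average, over the tags the target user attached to it, of each tag's share in that user's total tag usage. The function name `initial_values` and the toy inputs are hypothetical.

```python
from collections import Counter


def initial_values(user_item_tags, tag_usage):
    """Initial item values for one target user (assumed reading of Step 1).

    user_item_tags: item -> set of tags the target user put on the item
    tag_usage:      Counter, tag -> times the target user used the tag
    """
    total = sum(tag_usage.values())
    values = {}
    for item, tags in user_item_tags.items():
        if not tags or total == 0:
            values[item] = 0.0  # untagged item: no evidence of preference
            continue
        # Sum of the usage counts of the item's tags, divided by the
        # number of tags on the item and by the user's total tag usage.
        values[item] = sum(tag_usage[t] for t in tags) / (len(tags) * total)
    return values


# Hypothetical example: "python" used twice, "web" once.
usage = Counter({"python": 2, "web": 1})
f = initial_values({"a": {"python", "web"}, "b": {"python"}}, usage)
```

In this toy case item `b`, labeled only with the user's most frequent tag, gets a higher value (2/3) than item `a` (1/2), whose value is diluted by the less frequent tag.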

Step 2: Distribute the value of each item evenly to the
users who collect it, then the value a user $U_l$ will
receive reads:

$$ g_l = \sum_{I_j \in \Gamma(U_l)} \frac{f_j}{d(I_j)}, \qquad (2) $$

where $\Gamma(U_l)$ denotes the set of items collected by
$U_l$, and $d(I_j)$ is the degree of $I_j$ in the user-item
bipartite graph.
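Step 2 can be sketched as follows. The function name `spread_to_users` and the input mappings are hypothetical; the degree of an item is taken to be the number of users who collected it.

```python
from collections import defaultdict


def spread_to_users(item_values, collectors):
    """Split each item's value evenly among the users who collected it.

    item_values: item -> value of the item
    collectors:  item -> set of users who collected the item
    Returns user -> total value received.
    """
    received = defaultdict(float)
    for item, value in item_values.items():
        users = collectors.get(item, ())
        for user in users:
            received[user] += value / len(users)  # even split by degree
    return dict(received)


# Hypothetical example: item "a" has two collectors, item "b" has one.
item_values = {"a": 0.5, "b": 1.0}
collectors = {"a": {"u1", "u2"}, "b": {"u1"}}
received = spread_to_users(item_values, collectors)
```

The total value is conserved in this step: the 1.5 units held by the items end up as 1.25 for `u1` and 0.25 for `u2`.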

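Step 3: Redistribute the value of each user $U_l$ to his/her
collections according to the weight defined in Step 1. Then
the final value vector $f'$ of items will be summarized as:

$$ f'_j = \sum_{l=1}^{|U_{I_j}|} g_l \, \frac{\sum_{t_s \in T_{lj}} K(t_{ls})}{|T_{lj}| \sum_{s'} K(t_{ls'})}, \qquad (3) $$

where the outer sum runs over the users who collected item
$I_j$, $|U_{I_j}|$ is the number of such users, $g_l$ is the
value received by $U_l$ in Step 2, and $T_{lj}$ and
$K(t_{ls})$ are defined as in Step 1 with $U_l$ in place of
$U_i$.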



The above procedure constitutes a mutual reinforcement
process in which values are transferred between users and
items. In the first step, we highlight the items selected by
$U_i$ and assign each of them an initial value according to
$U_i$'s tagging activities. Step 2 transfers values from
items to users. In Step 3, we consider the personal
vocabulary again and distribute the values back to items,
which generates the final score of each item. Finally, we
sort these scores in descending order, and the top items
that have not been collected by $U_i$ are recommended to
$U_i$.
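The three steps and the final ranking can be put together in a short sketch. It rests on assumptions that are not confirmed by the text: the tag-based weight is read as the average usage share of an item's tags, Step 3 multiplies each user's received value by that weight without further normalization, and ties are broken by item name. The names `weight`, `recommend` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def weight(tags, usage):
    """Assumed tag-based weight: average usage share of the item's tags."""
    total = sum(usage.values())
    if not tags or total == 0:
        return 0.0
    return sum(usage[t] for t in tags) / (len(tags) * total)


def recommend(target, assignments, top_n=10):
    """Rank uncollected items for `target` from (user, item, tags) records."""
    tags_of = defaultdict(dict)   # user -> item -> set of tags
    usage = defaultdict(Counter)  # user -> tag -> usage count
    collectors = defaultdict(set) # item -> users who collected it
    for user, item, tags in assignments:
        tags_of[user][item] = set(tags)
        usage[user].update(set(tags))
        collectors[item].add(user)

    # Step 1: initial values on the items collected by the target user.
    f = {i: weight(t, usage[target]) for i, t in tags_of[target].items()}

    # Step 2: each item splits its value evenly among its collectors.
    g = defaultdict(float)
    for item, value in f.items():
        for user in collectors[item]:
            g[user] += value / len(collectors[item])

    # Step 3: each user passes value back to items, weighted by tags.
    scores = defaultdict(float)
    for user, value in g.items():
        for item, tags in tags_of[user].items():
            scores[item] += value * weight(tags, usage[user])

    # Keep only items the target has not collected; sort by score.
    candidates = [i for i in scores if i not in tags_of[target]]
    candidates.sort(key=lambda i: (-scores[i], i))
    return candidates[:top_n]


# Hypothetical toy records, for illustration only.
assignments = [
    ("u1", "a", ["python", "web"]),
    ("u1", "b", ["python"]),
    ("u2", "a", ["python"]),
    ("u2", "c", ["python", "code"]),
    ("u2", "d", ["misc"]),
    ("u3", "b", ["x"]),
    ("u3", "e", ["x"]),
]
ranking = recommend("u1", assignments, top_n=3)
```

For target `u1`, item `e` ranks first because it reaches `u1` through item `b`, which carries `u1`'s most frequent tag, and because `u3` describes `e` with the only tag in his or her vocabulary.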

In CTSes, different individuals have vocabularies of
different sizes, and each tag may carry a different
significance. Some tags are used frequently while others are
seldom picked. Frequently used tags should be of higher
importance from the user's viewpoint. If the user applies
frequently used tags to a specific item, this indicates that
the user prefers it to other items labeled with infrequently
used tags. A similar phenomenon is common in daily life:
people tend to explain a question using words they are
familiar with. In addition, the number of tags assigned to
an item reflects how willing the user is to describe it. By
aggregating the fractions of all the tags labeling a
specific item, one can estimate the importance of this item.

3.2 Data Set 

We use a benchmark data set, Del.icio.us, to evaluate the
proposed algorithm. Del.icio.us is one of the most popular
social bookmarking web sites. It allows users not only to
store and organize personal bookmarks (URLs), but also to
look into other users' collections and find what they might
be interested in by simply keeping track of the pools with
the same tags or items. The data used in this paper were
crawled from the website http://del.icio.us/ in May 2008. We
guarantee that each user has collected at least one item,
and that each item has been collected by at least two users
and has been assigned at least one tag. Table 1 summarizes
the basic information of the data set.

Figure 1: Precision versus the length of recommendation
list. The results reported here are averaged over 10
independent runs, each of which corresponds to a random
division of training set and testing set.

Figure 2: Recall versus the length of recommendation list.
The results reported here are averaged over 10 independent
runs, each of which corresponds to a random division of
training set and testing set.
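The constraints stated for the data set in Section 3.2 (each item collected by at least two users and carrying at least one tag) could be enforced with a filter along the following lines. This is a hypothetical sketch, not the authors' preprocessing: the record layout `(user, item, tags)`, the function name `filter_assignments` and the parameter `min_users_per_item` are assumptions.

```python
from collections import defaultdict


def filter_assignments(assignments, min_users_per_item=2):
    """Keep tagged records whose item has enough distinct collectors."""
    collectors = defaultdict(set)
    for user, item, tags in assignments:
        if tags:  # untagged records do not count as collections here
            collectors[item].add(user)
    return [
        (user, item, tags)
        for user, item, tags in assignments
        if tags and len(collectors[item]) >= min_users_per_item
    ]


# Hypothetical toy records, for illustration only.
raw = [
    ("u1", "a", ["x"]),
    ("u2", "a", ["y"]),
    ("u1", "b", ["x"]),  # item "b" has a single collector
    ("u3", "c", []),     # untagged record
]
kept = filter_assignments(raw)
```

Every user who survives the filter has at least one item by construction, so the first constraint needs no separate check.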

3.3 Experimental Results 

To test the algorithmic performance, the data set is
randomly divided into two parts: the training set, which is
used as known information, contains 95% of the entries, and
the remaining 5% of the entries constitute the testing set.
We employ three metrics to characterize the algorithmic
accuracy: Precision, Recall and F1, which are defined as
follows [8]:



$$ \mathrm{Precision} = \frac{\sum_{i} N_r^i}{nL}, \qquad (4) $$

where $n$ is the number of users, $L$ is the length of the
recommendation list, and $N_r^i$ is the number of recovered
items in the recommendations for user $U_i$.



$$ \mathrm{Recall} = \frac{\sum_{i} N_r^i}{\sum_{i} N_p^i}, \qquad (5) $$

where $N_p^i$ is the number of items collected by user $U_i$
in the testing set.



$$ F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \qquad (6) $$
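The three metrics can be computed with a short sketch such as the one below. It assumes that Precision divides the total number of hits by $nL$ and that Recall divides the total number of hits by the total number of test items; the second choice in particular is an assumption about the aggregation. The function name `evaluate` and the toy inputs are hypothetical, and $n$ is taken to be the number of users present in the testing set.

```python
def evaluate(recommended, held_out, list_length):
    """Return (precision, recall, f1) for top-`list_length` recommendations.

    recommended: user -> ranked list of recommended items
    held_out:    user -> set of items the user collected in the testing set
    """
    users = list(held_out)
    # Recovered items: test items that appear in the user's top-L list.
    hits = sum(
        len(set(recommended.get(u, [])[:list_length]) & held_out[u])
        for u in users
    )
    relevant = sum(len(held_out[u]) for u in users)
    precision = hits / (len(users) * list_length) if users else 0.0
    recall = hits / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example with two users and lists of length 2.
recommended = {"u1": ["a", "b"], "u2": ["c", "d"]}
held_out = {"u1": {"a"}, "u2": {"x", "d"}}
precision, recall, f1 = evaluate(recommended, held_out, list_length=2)
```

In this toy case two of the four recommended slots are hits (Precision 1/2), two of the three test items are recovered (Recall 2/3), and F1 is 4/7.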

Figure 1, Figure 2 and Figure 3 show the experimental
results for Precision, Recall and F1, respectively. Since a
recommendation list typically contains tens of items, our
experimental study focuses on the interval
$L \in [10, 100]$. For comparison, we choose the method
described in [25] as the baseline algorithm. It can be seen
that our proposed algorithm, which considers the personal
vocabulary, significantly outperforms the baseline method on
all three measures.
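The random division into training and testing sets used in this evaluation can be sketched as follows. The function name `split_entries`, the `seed` parameter and the use of integer stand-ins for entries are hypothetical; the paper states only the 95%/5% proportions and that results are averaged over 10 independent random divisions.

```python
import random


def split_entries(entries, test_fraction=0.05, seed=None):
    """Randomly split entries into (training, testing) lists."""
    rng = random.Random(seed)  # local generator, so runs are reproducible
    shuffled = list(entries)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical example: 100 entries, 10 independent divisions.
entries = list(range(100))
splits = [split_entries(entries, seed=run) for run in range(10)]
train, test = splits[0]
```

Each metric would then be computed once per division and averaged over the 10 runs, for every list length of interest.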

4. CONCLUSION AND DISCUSSION 



In this paper, we proposed a tag-based algorithm that takes
into account the personal vocabulary. Our algorithm is based
on the following hypotheses: (i) Tags assigned to a certain
item by a particular user represent that user's personal
view of it. Even for the same item, different individuals
may give different tags. (ii) Different tags play different
roles for the same user. The frequency of a tag may reflect
personal preference: the higher the frequency, the more the
user likes the tag. Experimental results demonstrate that
the usage of tag information can significantly improve the
accuracy of personalized recommendations.

Recently, collaborative tagging systems have attracted more
and more attention in both the scientific and engineering
worlds [UE3]. A great number of publications and web
applications have discussed or adopted tagging functions.
Our experimental results show that tags can be used not only
to assist in organizing personal resources, but also to help
filter massive information. This paper provides only a
simple way to make use of tags, and a couple of open issues
remain for future study. From the perspective of human
dynamics, the rank of tags within a single collection and
the time at which the user chooses tags could also be taken
into account. In addition, the hypergraph [6] description is
a promising tool for obtaining a comprehensive view of CTSes
and an in-depth understanding of their structure and
evolution.